Regulatory nucleotide sequence of the initiation of transcription

ABSTRACT

The invention relates to a recombinant nucleotide sequence, characterized in that it comprises: 
     a regulatory sequence of the initiation of transcription, this regulatory sequence containing a promoter in association with the motif GCACTC 9N GAGTGC, in which "N" signifies any one of the 4 bases thymine, guanine, adenine and cytosine; 
     a sequence coding for a polypeptide, called "heterologous polypeptide", which is different from that naturally associated with the promoter; the coding sequence being positioned downstream from the regulatory sequence of the initiation of transcription at a site which, under suitable conditions, would allow the expression of the polypeptide under the control of the promoter.

This application is a continuation of application Ser. No. 08/050,313, filed as PCT/FR91/00701 Sep. 3, 1991, published as WO92/04452, Mar. 19, 1992, abandoned.

BACKGROUND OF THE INVENTION

(i) Field of the Invention

The invention relates to a regulatory nucleotide sequence of the initiation of transcription and its use in the production of polypeptides by the recombinant approach. The technical problem which presented itself when the present invention was being developed was to identify a strong and, if possible, thermoinducible promoter, which could be used in a large number of micro-organisms and, in particular, in the Actinomycetes.

(ii) Description of Related Art

The Actinomycetes constitute a bacterial order of great economic and medical importance; mention need only be made of the fact that the Actinomycetes include, in particular, the Streptomyces and Mycobacterium genera.

The Streptomyces are used for the production of about 70% of the antibiotics sold today; furthermore, even though there is no longer occasion to describe the ravages caused by Mycobacterium tuberculosis and Mycobacterium leprae, great interest is attached to the expression of heterologous antigens in a strain of M. bovis BCG in order to produce a living polyvalent strain to be used as a vaccine.

The genetics of these bacteria has been little studied at the molecular level, but the regulation of genetic expression seems to be different from that which has been described for bacteria which have been studied in more detail such as Escherichia coli or Bacillus subtilis.

In particular, the Actinomycetes, the DNA of which is rich in G+C, recognize the "promoter" sequences in Escherichia coli and Bacillus subtilis poorly or not at all. Baird et al. (J. Gen. Microbiol. 1989, 135, 931-939) studied the genes coding for heat shock proteins in Mycobacterium and postulated that two sequences, TTGAG and TCTCATGT, located upstream from the sequence coding for the heat shock protein of 10 kDa of Mycobacterium tuberculosis, constitute the -35 and -10 regions of the promoter.

In fact, the two sequences exhibit a high degree of homology with the consensus sequence of the promoter in E. coli. Furthermore, sequences located upstream from the genes coding for other heat shock proteins in various species of Mycobacterium (for example, the protein of 65 kDa of Mycobacterium leprae or that of 64 kDa in M. bovis) also contain a pair of sequences of the same type, i.e.: TTGCCG and TTTCAT, or TTGCCG and CTTCAT, which thus show a high degree of homology with the E. coli promoter and with the "-35 and -10" sequences of the 10 kDa protein of Mycobacterium tuberculosis.

According to that article and the one by Thole et al. (Infection and Immunity, 55, 1987, 1466-1475), the promoters responsible for the transcription of the genes of the heat shock proteins in Mycobacterium contain, as -35 and -10 sequences, this type of coupled sequences. These sequences will be designated subsequently as "-10 and -35 sequences of the E. coli type". This being so, the authors had not confirmed this hypothesis by mapping and the identity of the promoters consequently remains uncertain.

In a surprising manner, the present inventors have noticed that the -10 and -35 sequences of the E. coli type are also present in Streptomyces, but do not play a role in the initiation of the transcription of heat shock proteins in the bacteria of this genus. It should be noted that, in the articles mentioned above, Baird and Thole had also observed the presence, in sequences upstream from the genes coding for the heat shock proteins in Mycobacterium, of a palindromic motif containing the sequence (SEQ ID NO:1) GCACTC 9N GAGTGC. However, the precise role of this motif had not been identified. Thole postulates that this motif is implicated either in the termination of transcription of an operon located upstream from the gene coding for the heat shock protein or in the regulation of the translation of a polycistronic messenger.

With the aim of cloning a strong (and, if possible, thermoinducible) promoter, which can be used in a large number of Actinomycetes, the inventors have studied the response to heat shock in Streptomyces and have cloned a strongly expressed protein in order to characterize its promoter. In fact, the response to heat shock resulting in the de novo synthesis of proteins is a universal phenomenon; it may thus be anticipated that its regulation will be similar in various Actinomycetes and in particular that the promoters will have consensus sequences which will enable them to be used in a large number of strains.

SUMMARY OF THE INVENTION

The inventors have thus identified and characterized two functional promoters in Streptomyces and, in addition, have been able to identify the function of the said palindrome as a consequence of the observation that each of these promoters contains the motif (SEQ ID NO:1) GCACTC 9N GAGTGC.

The invention relates to a recombinant nucleotide sequence characterized in that it comprises:

a regulatory sequence of the initiation of transcription, this regulatory sequence containing a promoter in combination with the motif (SEQ ID NO:1) GCACTC 9N GAGTGC in which "N" signifies any one of the 4 bases: thymine, guanine, adenine and cytosine;

a sequence coding for a polypeptide, called "heterologous polypeptide", other than that naturally associated with the said promoter;

the said coding sequence being positioned downstream from the said regulatory sequence of the initiation of transcription at a site which, under suitable conditions, would permit the polypeptide to be expressed under the control of the said promoter.

The recombinant nucleotide sequence of the invention is capable of giving rise to the expression of a heterologous gene in a cell containing it.

The regulatory sequence of the initiation of transcription, which forms part of the recombinant nucleotide sequence, is composed of a promoter in association with the motif (SEQ ID NO:1) GCACTC 9N GAGTGC. In this context, "in association" means that the motif (SEQ ID NO:1) GCACTC 9N GAGTGC may be distinct from the promoter or it may be contained, at least in part, within the promoter. This latter possibility includes the situation in which the motif overlaps the promoter sequence and the situation in which the motif constitutes an integral part of the promoter.
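As an illustration only, the motif GCACTC 9N GAGTGC can be located in a nucleotide sequence with a simple pattern search. The sketch below is not part of the described procedure; the names MOTIF and find_motif are hypothetical, and the two test sequences are copied from SEQ ID NO:2 and SEQ ID NO:4 of the sequence listing.

```python
import re

# Motif from the text: GCACTC, then any 9 bases, then GAGTGC.
MOTIF = re.compile(r"GCACTC[ACGT]{9}GAGTGC")

# Promoter sequences copied from the sequence listing (SEQ ID NO:2 and NO:4).
P1 = "CATTGGCACTCCGCTTGACCGAGTGCTAATCGCGGTCATAGTCTCAGCTCTG"
P2 = "GGAGGCCCCTAGCGCCTGCACTCTCCTACCCCGAGTGCTATTATTGGCGTTA"


def find_motif(seq):
    """Return (start, end, text) for each motif occurrence (0-based, end exclusive)."""
    return [(m.start(), m.end(), m.group()) for m in MOTIF.finditer(seq.upper())]


for name, seq in (("P1", P1), ("P2", P2)):
    print(name, find_motif(seq))
```

Under these assumptions, each of the two promoter sequences yields exactly one hit of 21 bases, consistent with the statement that the motif may be contained within the promoter.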

Promoters associated with the sequence (SEQ ID NO:1) GCACTC 9N GAGTGC which are particularly preferred according to the invention are the promoters present in the Actinomycetes and, more particularly, those which in the bacterial genome are normally associated with heat shock proteins. As examples of this type of promoter, mention may be made of the promoters of heat shock proteins of 18 kDa (P1) and 56 kDa (P2) in Streptomyces albus identified within the framework of the invention by the inventors:

P1 corresponding to one of the sequences (SEQ ID NOS. 2-3): ##STR1##

P2 corresponding to one of the sequences (SEQ ID NOS. 4-5): ##STR2##

Each of these promoters may be shortened by a maximum of 4 or 5 bases at the 5' end without adversely affecting its activity. Other types of promoters are those for heat shock proteins of 10 kDa and 65 kDa from Mycobacterium tuberculosis, 64 kDa from M. bovis and 65 kDa from M. leprae.

The association of the GCACTC 9N GAGTGC motif with the promoter confers thermoinducible character on the promoter. It is probable that this sequence is an operator and constitutes the binding site for a repressor, which thus prevents the RNA polymerase from binding to the -10 and -35 sequences of the promoter. In the case in which the motif is distinct from the promoter, it is preferably upstream from the promoter, by about 150 to 200 bases, for example.

The recombinant sequence of the invention contains, in addition to the regulatory sequence of the initiation of transcription, a sequence coding for a polypeptide called "heterologous polypeptide", different from that which is naturally associated with the said promoter. Thus, the immediate genetic environment of the promoter is different from that in the genome from which it is derived. As examples of types of heterologous polypeptides, mention may be made of neutralizing antigens which can be used in the production of recombinant live vaccines, polypeptides conferring resistance to an antibiotic, enzymes, etc.

In the nucleotide sequence of the invention, the coding sequence is positioned downstream from the said regulatory sequence. Their relative positions are, of course, such that the expression of the coding sequence takes place under the control of the promoter.

In addition, the invention relates to an expression vector containing the nucleotide sequence of the invention, for example a plasmid.

The invention also relates to a cell transformed by this expression vector, the said cell being capable of recognizing the promoter used in the regulatory sequence of the initiation of transcription. The transformed cells are preferably prokaryotic cells and, more particularly, prokaryotes belonging to the order of the Actinomycetes, for example Streptomyces or Mycobacterium.

The invention also relates to a procedure for the production of a polypeptide characterized in that it comprises the following steps:

transformation of a cell by an expression vector according to the invention under conditions allowing the expression of the said polypeptide, the said cell being capable of recognizing the said promoter;

recovery of the polypeptide expressed.

The conditions allowing the expression of the polypeptide are those known from the prior art and, in the present case, preferably include the use of heat shock, which has the effect of inducing expression. The heat shock may be an increase in temperature from about 37° C. to 45° C. or, in particular, 40° C. to 45° C., for example 37° C. or 41° C. in Streptomyces, 42° C. to 45° C. in the case of Mycobacterium.

It is interesting to note that the use of the promoters P1 and P2 of the invention results in a sustained expression of the heterologous protein at high temperature, for example between 37° and 41° C. in Streptomyces.

Another feature of the invention relates to the possibility of transforming a promoter into a thermoinducible promoter as a result of its association with the (SEQ ID NO.1) GCACTC 9N GAGTGC motif. More particularly, this feature of the invention relates to a procedure for conferring a thermoinducible character on a promoter, characterized by the juxtaposition of a sequence containing the (SEQ ID NO.1) GCACTC 9N GAGTGC motif, on the one hand, and the promoter, on the other, the sequence containing the (SEQ ID NO.1) GCACTC 9N GAGTGC motif being positioned upstream from the promoter, or by insertion of the sequence containing the (SEQ ID NO.1) GCACTC 9N GAGTGC motif at a site which is, at least in part, contained within the promoter, this latter site being selected such that the simple insertion of the said palindrome does not perturb the activity of the promoter. The precise position in which the (SEQ ID NO.1) GCACTC 9N GAGTGC motif must be placed with respect to the promoter in order to be able to confer thermoinducible character may vary depending on the promoter used. This can be checked by detecting, on application of a heat shock, the expression of an easily detectable heterologous gene, for example a gene marker such as LacZ, in a cell transformed by the construction under test. The positioning of the (SEQ ID NO.1) GCACTC 9N GAGTGC motif at a site about 150 to 200 bases upstream from the promoter can confer this thermoinducible character. In some cases, the (SEQ ID NO.1) GCACTC 9N GAGTGC motif may be inserted at a site which is, at least in part, contained within the promoter. In such a case, the insertion site must be selected such that the simple insertion of the motif does not perturb the activity of the promoter other than that due to the introduction of the thermoinducible effect. It is important not to modify the -10 and -35 sequences of the promoter when this insertion is made. The thermoinducible character of the regulatory sequence of the initiation of transcription thus produced may be checked by applying the method described above.

While they were studying the promoters, the inventors studied the response to heat shock of various species of Streptomyces. In addition to the principal heat shock proteins with molecular weights of 90-100, 70 and 56-58 kDa, a protein of 16 to 18 kDa was observed in each of the species tested. This protein (called HSP18) is produced at very high levels in Streptomyces albus when the culture is transferred from 30° to 37° C. and may constitute up to 10% of the total proteins. The induction by means of heat shock of the proteins of 70 and 90-100 kDa is transient, whereas that of the proteins of 56-58 kDa and 18 kDa is constitutive, the production being sustained at high temperatures.

The protein called HSP18 was purified and characterized. Its properties are unlike those of other heat shock proteins. For example, apart from its relatively small size and its being regulated constitutively at high temperature, it possesses an isoelectric point higher than 9. This very high isoelectric point is, however, not reflected in its amino acid composition (see Table 2).

Furthermore, the determination of its amino acid composition revealed a rather low methionine content, which is not consistent with the results of /35S/ methionine incorporation experiments (see Table 1). These observations suggest that the HSP18 protein undergoes modification which takes place after translation. The HSP18 protein does not react with polyclonal antibodies against the GroEL protein from E. coli, nor with monoclonal antibodies specific for the 65 kDa heat shock protein from Mycobacterium leprae.

A study of the transcription of the "groEL-1" gene coding for the HSP18 protein showed that HSP18 is, in fact, a truncated protein. The groEL-1 gene codes in reality for a protein of 56 kDa which is modified after translation and gives rise to the 18 kDa protein.

FIG. 6 shows the partial sequence of the groEL-1 gene and its amino acid translation product. This sequence corroborates the sequences determined by Edman degradation of HSP18. The sequence shown in FIG. 6 lacks the COOH terminus of the 56 kDa protein. The sequence of the 18 kDa protein is included in this parent sequence, their two NH2 termini being identical (amino acid No. 1). HSP18 extends maximally up to about amino acid 170.
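The sizes quoted above can be checked for rough plausibility with a back-of-the-envelope estimate. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a figure taken from this document; the function name approx_mass_kda is hypothetical. The value of 440 residues corresponds to the 1320 base pair coding sequence given for SEQ ID NO:8.

```python
# Assumption: average mass of an amino acid residue, a rule of thumb only.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues):
    """Estimate a polypeptide mass in kDa from its residue count."""
    return round(n_residues * AVG_RESIDUE_MASS_DA / 1000.0, 1)


# Fragment extending to about amino acid 170 (the truncated HSP18 protein).
print(approx_mass_kda(170))

# Partial sequence of 1320 base pairs, i.e. 1320 / 3 = 440 residues.
print(approx_mass_kda(1320 // 3))
```

On this estimate, about 170 residues give roughly 18.7 kDa, and the 440 residues of the partial sequence give roughly 48 kDa, which is below 56 kDa as expected for a sequence lacking its COOH terminus.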

FIG. 8 gives the complete sequence of the protein encoded by the groEL-1 gene, which comprises the HSP18 protein, just like the sequence shown in FIG. 6. The invention relates to a heat shock protein comprising either (i) the amino acid sequence shown in FIG. 6 or the sequence corresponding to the amino acid sequence shown in FIG. 8, or (ii) a sequence exhibiting at least 85% homology with this sequence, or (iii) a part of the sequence (i) or (ii) comprising the NH2 terminus and extending up to about amino acid 170, the polypeptide (iii) having a molecular weight of about 18 kDa and a very basic isoelectric point of about 9.

By analogy with the function of other proteins of the GroEL type, it is probable that HSP18 is essential for the survival of the cell and plays the role of a "molecular chaperone", i.e. binds transiently to nascent polypeptides, preventing the aggregation of insoluble proteins and making folding and transport through the cell membrane possible. It is also possible that HSP18 is implicated in the resistance of the strain to its own antibiotics or in tolerance to heat.

In accordance with a special feature, the invention also relates to a polypeptide containing the COOH terminal region of the GroEL-1 protein as shown in FIG. 8. A particular polypeptide corresponding to this definition contains or corresponds to the following amino acid sequence: Gly His Gly His Gly His Ser His.

The amino acid sequence described above corresponds to an original sequence of amino acids when compared with the COOH terminal peptide sequences known for heat shock proteins. This COOH terminal sequence might be implicated in the formation of the truncated 18 kDa protein.

BRIEF DESCRIPTION OF THE DRAWINGS

Various features of the invention are illustrated in the figures:

FIG. 1 shows schematically the cloning of the groEL-1, groES and groEL-2 genes. The sites which have served for the construction of the plasmids pPM1005 and pPM997+Neo are shown in brackets.

FIG. 2 (SEQ ID NOS. 1-6) shows the P2 promoter of groEL-2 and the P1 promoter of groES and groEL-1.

FIG. 3 shows the vector pPM1005 containing the neo gene of Tn5 under the control of the SmaI fragment of 440 bp of Streptomyces albus. This fragment contains the P1 promoter and the first 160 base pairs of the groES gene.

FIG. 4 shows the vector pPM997+Neo containing the neo gene of Tn5 under the control of the BglII/SstI fragment of 800 bp of Streptomyces albus. This fragment contains the P2 promoter and the first 183 bp of the groEL-2 gene.

FIG. 5 (SEQ ID NO.7) shows the nucleotide sequence of the groES structural gene and the deduced amino acid sequence of the GroES protein.

FIGS. 6A-C (SEQ ID NO.8) illustrate the nucleotide sequence as well as the deduced amino acid sequence of the protein precursor of HSP18.

FIGS. 7A-B (SEQ ID NO.9) show the nucleotide sequence of the gro es el operon together with its promoter sequence.

FIGS. 8A-D (SEQ ID NO.10) show the complete amino acid sequence encoded by the groEL-1 gene with which it is aligned.

FIGS. 9A-B (SEQ ID NO.11) show the nucleotide sequence of the complete groEL-1 gene.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

EXAMPLES

Effect of Temperature on Protein Synthesis in Streptomyces:

The total protein extracts of 15 different species of Streptomyces were prepared before and after application of a heat shock (increase in temperature from 30° C. to 41° C.). Major heat shock proteins of 90-100, 70 and 56-58 kDa were detected on an SDS-PAGE gel stained with Coomassie blue. In addition, a band corresponding to a protein of 18 kDa was also observed in some species. This protein was very strongly induced in Streptomyces albus.

Immunological Properties of the Proteins:

A "Western blot" analysis of the proteins on the gel with monoclonal antibodies against the 65 kDa heat shock protein of Mycobacterium leprae and with polyclonal antibodies against the GroEL protein of E. coli was carried out. None of the proteins reacted with the monoclonal antibodies and only the 56-58 kDa HSPs reacted with the polyclonal antibodies.

Study of the Heat Shock Proteins in Streptomyces albus:

The response to heat shock in Streptomyces albus was analysed by means of electrophoresis of the total proteins at 30° C. and 41° C. The proteins were labelled for 40 minutes. The amino acids used were /35S/ methionine and /14C/ alanine. It was possible to visualise four major heat shock proteins. HSP90, which could not be detected at 30° C., represented about 3% of the total proteins after heat shock. The amount of HSP70 was at least doubled. HSP56-58 showed an increase of 30% and HSP18, which could not be detected before heat shock, represented 4 to 7% of the total proteins after the shock (see Table 1).

                  TABLE 1
______________________________________________________________
QUANTIFICATION OF THE LEVELS OF SYNTHESIS OF THE MAJOR HEAT
SHOCK PROTEINS AFTER LABELLING WITH 14C ALANINE AND 35S METHIONINE

               Level of synthesis (a) at 30° C. and 41° C.
Protein MW(b)  35S/30° C.   35S/41° C.   14C/30° C.   14C/41° C.
______________________________________________________________
90             c            3            --           2.9
70             2.7          6.4          2.3          5.8
56-58          6.8          8.4          6.2          9.0
18             >0.7?        6.9          >0.7?        3.8
______________________________________________________________
a. expressed as a percentage of the total optical density (O.D.) measured by means of autoradiography.
b. apparent molecular weight in kDa.
c. not detected.
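The induction levels discussed in the text can be recomputed from the figures in Table 1. The following is a minimal sketch, not part of the original analysis; the names TABLE1 and fold_change are hypothetical. Only the 70 kDa and 56-58 kDa rows are used, since the 30° C. values of the other two rows are either not detected or uncertain.

```python
# Values copied from Table 1 (percentage of total optical density),
# stored as (value at 30 deg C, value at 41 deg C) for each label.
TABLE1 = {
    "HSP70": {"35S": (2.7, 6.4), "14C": (2.3, 5.8)},
    "HSP56-58": {"35S": (6.8, 8.4), "14C": (6.2, 9.0)},
}


def fold_change(before, after):
    """Ratio of the level after heat shock to the level before."""
    return after / before


for protein, labels in TABLE1.items():
    for label, (at30, at41) in labels.items():
        fc = fold_change(at30, at41)
        print(f"{protein} {label}: {fc:.2f}-fold ({(fc - 1) * 100:.0f}% increase)")
```

With these inputs HSP70 rises more than two-fold with either label, in line with "at least doubled", while HSP56-58 rises by roughly 24% (35S) and 45% (14C), values that bracket the 30% quoted in the text.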

Study of the HSP18 of Streptomyces albus:

The 18 kDa protein of Streptomyces albus (HSP18), which is extremely basic, was purified and its partial amino acid composition was determined (see Table 2):

                  TABLE 2
______________________________________
AMINO ACID COMPOSITION OF HSP18

        Asx    12.2
        Thr     9.4
        Ser     3.2
        Glx    10.8
        Ala    12.1
        Cys     0.0
        Met     0.0
        Val     9.2
        Ile     5.7
        Leu     6.8
        Tyr     1.3
        Phe     1.9
        His     0.3
        Lys     6.9
        Arg     4.5
        Gly    11.0
        Pro     4.4
______________________________________
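The remark that the high isoelectric point is not reflected in the composition can be illustrated by summing the relevant entries of Table 2. This is a sketch only: the units are not stated in the table and are assumed here to be mol %, and the names COMPOSITION and group_total are hypothetical.

```python
# Values copied from Table 2. Units are not stated in the source and are
# assumed to be mol % (assumption). Asx = Asp + Asn, Glx = Glu + Gln.
COMPOSITION = {
    "Asx": 12.2, "Thr": 9.4, "Ser": 3.2, "Glx": 10.8, "Ala": 12.1,
    "Cys": 0.0, "Met": 0.0, "Val": 9.2, "Ile": 5.7, "Leu": 6.8,
    "Tyr": 1.3, "Phe": 1.9, "His": 0.3, "Lys": 6.9, "Arg": 4.5,
    "Gly": 11.0, "Pro": 4.4,
}


def group_total(residues):
    """Sum the table entries for a group of residues, rounded to one decimal."""
    return round(sum(COMPOSITION[r] for r in residues), 1)


total = group_total(COMPOSITION)                   # all listed residues
basic = group_total(["Lys", "Arg", "His"])         # basic side chains
acid_or_amide = group_total(["Asx", "Glx"])        # acids plus their amides
print(total, basic, acid_or_amide)
```

Because acid hydrolysis does not distinguish Asp from Asn or Glu from Gln, the Asx and Glx totals give only an upper bound on the acidic residues, so the net charge cannot be read directly from such a table.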

The sequences of the NH2 terminus and of two internal fragments of the protein were determined by means of Edman degradation.

Synthesis of Oligonucleotides:

Two degenerate nucleotide probes of 30 bases were synthesized on the basis of the peptide sequence of one of the internal fragments described above. The sequence of this fragment (SEQ ID NO.12) is:

. . D-D-P-Y-E-N-L-G-A-Q. . . .
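To illustrate why probes derived from this peptide are degenerate, the sketch below counts the DNA sequences that could encode the ten residues under the standard genetic code. It is an illustration, not the inventors' design procedure; how the degeneracy was reduced to two probes is not stated here. The names CODONS, degeneracy and encodes are hypothetical, and GENOMIC is the corresponding stretch copied from SEQ ID NO:8.

```python
from math import prod

# Standard genetic code, restricted to the residues present in the peptide.
CODONS = {
    "D": ["GAT", "GAC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "Y": ["TAT", "TAC"],
    "E": ["GAA", "GAG"],
    "N": ["AAT", "AAC"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "Q": ["CAA", "CAG"],
}

PEPTIDE = "DDPYENLGAQ"
GENOMIC = "GACGACCCGTACGAGAACCTCGGCGCCCAG"  # from SEQ ID NO:8


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def encodes(dna, peptide):
    """True if the DNA, read in frame from its first base, encodes the peptide."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return len(codons) == len(peptide) and all(
        codon in CODONS[aa] for codon, aa in zip(codons, peptide)
    )


print(len(PEPTIDE) * 3, degeneracy(PEPTIDE), encodes(GENOMIC, PEPTIDE))
```

Ten residues correspond to the stated probe length of 30 bases, and the genomic stretch is one of the many sequences compatible with the peptide.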

The following (SEQ ID NOS. 13-14) deoxyoligonucleotide probes were synthesized: ##STR3##

Cloning of the Gene for the Thermoinducible Protein HSP18:

After hybridization at 60° C. in 5× SSC, these oligonucleotide probes made it possible to characterize and clone a 1.9 kb XhoI restriction fragment of Streptomyces albus (see cloning A in FIG. 1).

This fragment was sequenced; it contains an open reading frame which extends from an ATG at position 430 to beyond the cloned region, thus coding for a protein of more than 50 kDa, but the NH2 terminus of which corresponds to the nucleotide sequence deduced from the peptide sequence of HSP18. The amino acid sequence corresponding to this gene shows, in addition, a strong homology throughout its length with the heat-shock protein GroEL of E. coli and the 65 kDa protein from Mycobacterium leprae (75% homology). Initially, this gene was called "groEL-1".

Demonstration and Cloning of a Second "groEL-like" Gene in Streptomyces albus:

Hybridizations of the genome of Streptomyces albus were carried out using the 5' part of the gene for HSP65 of Mycobacterium leprae as probe; this probe gives two signals after hybridization with the genome of Streptomyces albus, one strong and one weak, irrespective of the enzyme used to digest the DNA. The weak signal corresponds to the signals obtained with the oligonucleotides deduced from HSP18, i.e. to the groEL-1 gene. The strong signal corresponds to a gene coding for another "GroEL-like" protein of accepted size (65 kDa). There are thus two groEL-like genes in Streptomyces albus.

Cloning of the Gene for the Heat-Shock Protein HSP65 of Streptomyces albus:

The 1.2 kb XhoI restriction fragment strongly bound by the HSP65 probe from Mycobacterium leprae was cloned (cloning C in FIG. 1).

The nucleotide sequence of this fragment was determined. The 1.2 kb XhoI fragment codes for an internal fragment of a protein showing 90% homology with the 65 kDa protein from Mycobacterium leprae. In addition, the two "groEL-like" genes 1 and 2 in Streptomyces albus show an 80% homology.

The gene coding for this 65 kDa protein was called groEL-2.

Study of the Transcription of the "groEL-like" Genes and the Search for the Promoters:

The total RNAs of the Streptomyces albus strain were extracted at various times during a heat shock experiment and treated according to the "Northern blot" technique, then hybridized with various oligonucleotides, the synthesis of which was based on either the groEL-1 sequence or the groEL-2 sequence and which were specific for each of the regions selected in these two sequences. The same nitrocellulose filters were used in repeat hybridizations with the totality of the two cloned fragments. Three very strongly thermoinducible transcripts are observed; their sizes are about 2500, 2100 and 650 bases, respectively. The one with 2100 bases corresponds to the groEL-2 transcript, the one with 650 bases to the transcript of the gene situated upstream from groEL-1, and the one with 2500 bases to the co-transcription of groEL-1 and the gene situated upstream from groEL-1. These results showed, on the one hand, that the two genes groEL-1 and groEL-2 had indeed strong and inducible promoters, in particular thermoinducible promoters, and, on the other, that the groEL-1 promoter had not been cloned in the 1.9 kb fragment. In particular, these results show that none of the RNAs starts at the loop marked P? in FIG. 1, the sequence postulated as being capable of serving as promoter in Mycobacterium.

In fact, this loop contains two sequences, TTTGCCGGG and TTTCAT, which, in the absence of mapping data for the promoter, were considered to be the -35 and -10 regions, respectively, of the promoter for the 65 kDa protein from Mycobacterium (see, for example, J. Gen. Microbiol. (1989), 135, 931-939). These results show that these two sequences do not form part of the promoter of the groEL-1 gene in Streptomyces albus.

The desired promoter would be expected to be situated upstream from the gene forming an operon with groEL-1. The gene situated upstream from groEL-1 has been identified; it is a gene showing strong homology to the groES gene of E. coli where it also forms an operon with groEL-1 (see FIG. 5).

Cloning of the Promoter Regions of the Two Genes groEL-1 and groEL-2:

Two novel fragments of Streptomyces albus DNA, hydrolysed by BclI/SacI (1700 bp) and BglII/SacI (900 bp), were cloned with the aid of oligonucleotides synthesized starting from the sequence of the previously cloned fragments described above. Hence they partially span the fragment bearing groEL-1 and that bearing groEL-2, respectively, and extend upstream for about 800 bp in each case (clonings B and D in FIG. 1).

These fragments were sequenced, then the promoters were characterized by means of mapping using the S1 nuclease and by primer extension using reverse transcriptase. The sequences of the promoters of the two genes groEL-1 and groEL-2 thus characterized are not identical (FIG. 2), but they show considerable structural homology; in particular, both possess the following palindromic sequence:

    GCACTC 9N GAGTGC
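The palindromic nature of this sequence can be verified by taking the reverse complement of one arm. The sketch below is illustrative; the names reverse_complement and arms_are_inverted_repeat are hypothetical, and the test site is the motif occurrence found in SEQ ID NO:2.

```python
# Complement table for DNA, with N (any base) mapped to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def arms_are_inverted_repeat(site, arm_len=6):
    """True if the last arm_len bases are the reverse complement of the first."""
    return reverse_complement(site[:arm_len]) == site[-arm_len:]


CONSENSUS = "GCACTC" + "N" * 9 + "GAGTGC"
P1_SITE = "GCACTCCGCTTGACCGAGTGC"  # motif occurrence in SEQ ID NO:2

print(reverse_complement("GCACTC"))
print(reverse_complement(CONSENSUS) == CONSENSUS)
print(arms_are_inverted_repeat(P1_SITE), reverse_complement(P1_SITE) == P1_SITE)
```

The two six-base arms are exact inverted repeats of each other, whereas the nine-base spacer of an actual site need not be symmetrical, so only the arms are palindromic in the strict sense.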

The sequences of the two promoters are the following:

P1 corresponding to one of the sequences (SEQ ID NOS. 2-3): ##STR4##

P2 corresponding to one of the sequences (SEQ ID NOS. 4-5): ##STR5##

Use of the Promoters of groEL-1 and groEL-2 for the Expression of a Heterologous Gene:

These two promoters were used for the expression of the heterologous neo gene of the transposon Tn5 of Klebsiella. This gene codes for an aminoglycoside phosphotransferase (APH) which confers resistance to neomycin/kanamycin. This gene was cloned downstream from the two promoters (FIGS. 3 and 4), then introduced into Streptomyces albus and also into S. lividans. The neo gene is then strongly expressed, as is shown by the considerable degree of resistance to these antibiotics which it confers; furthermore, we have been able to visualize the synthesis of the APH in crude extracts after electrophoresis on polyacrylamide gel and immuno-blotting with anti-APH antibodies. It must be emphasized that these results were obtained in Streptomyces with an integrating vector. Hence, in these experiments, there is only one copy of the neo gene and of the promoter under study per genome. In fact, in order to judge the strength of the promoter, it was important not to increase the expression of neo artificially by increasing the number of copies of it by using a vector which generates a large number of copies.

The HindIII-BamHI fragments of pPM1005 and the SmaI-SmaI fragments of pPM997 have also been inserted into Mycobacterium. The SmaI-SmaI fragment of pPM997 contains the neo gene, the P2 promoter and the terminator.

__________________________________________________________________________
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(iii) NUMBER OF SEQUENCES: 15
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
GCACTCNNNNNNNNNGAGTGC 21
(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 52 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
CATTGGCACTCCGCTTGACCGAGTGCTAATCGCGGTCATAGTCTCAGCTCTG 52
(2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 53 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
GCATTGGCACTCCGCTTGACCGAGTGCTAATCGCGGTCATAGTCTCAGCTCTG 53
(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 52 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GGAGGCCCCTAGCGCCTGCACTCTCCTACCCCGAGTGCTATTATTGGCGTTA 52
(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 53 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
GGAGGCCCCTAGCGCCTGCACTCTCCTACCCCGAGTGCTAATTATTGGCGTTA 53
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
GCACTCNNNNNNNCCGAGTGCTAAT 25
(2) 
INFORMATION FOR SEQ ID NO:7:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 309 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: double    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: CDS    (B) LOCATION: 1..306    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:    GTGACGACCGCCAGCTCCAAGGTTGCCATCAAGCCGCTCGAGGACCGC48    ValThrThrAlaSerSerLysValAlaIleLysProLeuGluAspArg    151015    ATCGTGGTCCAGCCGCTCGACGCCGAGCAGACCACGGCTTCGGGCCTG96    IleValValGlnProLeuAspAlaGluGlnThrThrAlaSerGlyLeu    202530    GTCATCCCGGACACCGCGAAGGAGAAGCCCCAGGAGGGCGTCGTCCTC144    ValIleProAspThrAlaLysGluLysProGlnGluGlyValValLeu    354045    GCGGTCGGCCCGGGCCGCTTCGAGAACGGCGAGCGCCTGCCGCTCGAC192    AlaValGlyProGlyArgPheGluAsnGlyGluArgLeuProLeuAsp    505560    GTCAAGACCGGCGACGTCGTGCTGTACAGCAAGTACGGCGGCACCGAG240    ValLysThrGlyAspValValLeuTyrSerLysTyrGlyGlyThrGlu    65707580    GTCAAGTACAACGGCGAGGAGTACCTCGTCCTCTCGGCCCGCGACGTT288    ValLysTyrAsnGlyGluGluTyrLeuValLeuSerAlaArgAspVal    859095    CTCGCCATCATCGAGAAGTAG309    LeuAlaIleIleGluLys    100    (2) INFORMATION FOR SEQ ID NO:8:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 1320 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: double    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: CDS    (B) LOCATION: 1..1320    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:    ATGGCGAAGATTCTGAAGTTCGACGAGGACGCCCGTCGCGCCCTTGAG48    MetAlaLysIleLeuLysPheAspGluAspAlaArgArgAlaLeuGlu    151015    CGCGGCGTGAACCAGCTGGCCGACACCGTCAAGGTGACCATCGGCCCC96    ArgGlyValAsnGlnLeuAlaAspThrValLysValThrIleGlyPro    202530    AAGGGCCGCAACGTCGTCATCGACAAGAAGTTCGGCGCCCCGACCATC144    LysGlyArgAsnValValIleAspLysLysPheGlyAlaProThrIle    354045    ACCAACGACGGCGTCACCATCGCCCGTGAGGTCGAGTGCGACGACCCG192    ThrAsnAspGlyValThrIleAlaArgGluValGluCysAspAspPro    505560    TACGAGAACCTCGGCGCCCAGCTCGTCAAGGAGGTGGCGACCAAGACC240    TyrGluAsnLeuGlyAlaGlnLeuValLysGluValAlaThrLysThr    65707580    
AACGACATCGCGGGTGACGGCACCACCACCGCGACCGTGCTGGCCCAG288    AsnAspIleAlaGlyAspGlyThrThrThrAlaThrValLeuAlaGln    859095    GCGCTGGTCCGCGAGGGCCTGCGCAACGTCGCCGCCGGCGCCTCCCCG336    AlaLeuValArgGluGlyLeuArgAsnValAlaAlaGlyAlaSerPro    100105110    GCCGCCCTGAAGAAGGGCATCGACGCCGCCGTCGCCGCCGTCTCCGCC384    AlaAlaLeuLysLysGlyIleAspAlaAlaValAlaAlaValSerAla    115120125    GAGCTGCTCGACACCGCGCGCCCGATCGACGACAAGTCCGACATCGCC432    GluLeuLeuAspThrAlaArgProIleAspAspLysSerAspIleAla    130135140    GCCGTCGCCGCGCTCTCCGCGCAGGACAAGCAGGTCGGCGAGCTCATC480    AlaValAlaAlaLeuSerAlaGlnAspLysGlnValGlyGluLeuIle    145150155160    GCCGAGGCGATGGACAAGGTCGGCAAGGACGGTGTCATCACCGTCGAG528    AlaGluAlaMetAspLysValGlyLysAspGlyValIleThrValGlu    165170175    GAGTCCAACACCTTCGGTGTCGACCTGGACTTCACCGAGGGCATGGCC576    GluSerAsnThrPheGlyValAspLeuAspPheThrGluGlyMetAla    180185190    TTCGACAAGGGCTACCTGTCCCCGTACATGGTGACCGACCAGGAGCGT624    PheAspLysGlyTyrLeuSerProTyrMetValThrAspGlnGluArg    195200205    ATGGAGGCCGTCCTCGACGACCCGTACATCCTGATCCACCAGGGCAAG672    MetGluAlaValLeuAspAspProTyrIleLeuIleHisGlnGlyLys    210215220    ATCGGTTCGATCCAGGACCTGCTGCCGCTGCTGGAGAAGGTCATCCAG720    IleGlySerIleGlnAspLeuLeuProLeuLeuGluLysValIleGln    225230235240    GCGGGTGGCTCCAAGCCGCTGCTGATCATCGCCGAGGACGTCGAGGGC768    AlaGlyGlySerLysProLeuLeuIleIleAlaGluAspValGluGly    245250255    GAGGCCCTGTCGACCCTGGTGGTCAACAAGATCCGCGGCACGTTCAAC816    GluAlaLeuSerThrLeuValValAsnLysIleArgGlyThrPheAsn    260265270    GCCGTCGCCGTCAAGGCGCCCGGCTTCGGTGACCGCCGCAAGGCGATG864    AlaValAlaValLysAlaProGlyPheGlyAspArgArgLysAlaMet    275280285    CTCGGCGACATGGCCACCCTCACCGGTGCCACCGTCATCGCCGAGGAG912    LeuGlyAspMetAlaThrLeuThrGlyAlaThrValIleAlaGluGlu    290295300    GTCGGCCTCAAGCTCGACCAGGCCGGTCTGGACGTGCTGGGCACCGCC960    ValGlyLeuLysLeuAspGlnAlaGlyLeuAspValLeuGlyThrAla    305310315320    CGCCGCGTCACCGTCACCAAGGACGACACGACCATCGTGGACGGCGGC1008    ArgArgValThrValThrLysAspAspThrThrIleValAspGlyGly    325330335    GGCAACGCCGAGGACGTCCAGGGCCGCGTCGCCCAGATCAAGGCCGAG1056    
GlyAsnAlaGluAspValGlnGlyArgValAlaGlnIleLysAlaGlu    340345350    ATCGAGTCGACCGACTCGGACTGGGACCGCGAGAAGCTCCAGGAGCGC1104    IleGluSerThrAspSerAspTrpAspArgGluLysLeuGlnGluArg    355360365    CTCGCCAAGCTGGCCGGCGGCGTCTGCGTGATCCGCGTCGGCGCGGCC1152    LeuAlaLysLeuAlaGlyGlyValCysValIleArgValGlyAlaAla    370375380    ACCGAGGTCGAGCTGAAGGAGCGCAAGCACCGTCTGGAGGACGCCATC1200    ThrGluValGluLeuLysGluArgLysHisArgLeuGluAspAlaIle    385390395400    TCCGCGACCCGCGCCGCGGTCGAGGAGGGCATCGTCTCCGGTGGTGGC1248    SerAlaThrArgAlaAlaValGluGluGlyIleValSerGlyGlyGly    405410415    TCCGCGCTGGTCCACGCCGTCAAGGTCCTGGACGACAACCTCGGCCGC1296    SerAlaLeuValHisAlaValLysValLeuAspAspAsnLeuGlyArg    420425430    ACCGGCGACGAGGCCACCGGTGTC1320    ThrGlyAspGluAlaThrGlyVal    435440    (2) INFORMATION FOR SEQ ID NO:9:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 2167 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:    CCGGCCGGGCTGAGGTTGGCTGGCTGGCCGGGTTCGGCCGGTGGGTCGAGGTGGCCTGGC60    CGGGCTCGCCAGGGTGAGTTGGCCGAGCCGAGGCGGCCCCGGGGCTCCCCGGGCCGAGTT120    GCGCGGCCAGGCCAGGGCTCAGCAGGGTGGGGGAGTGGGGCAGGCGGCCCGGTAGGGGAG180    TGCGGGAGGGCAGCGCGCGCCGCGCGCATTGGCACTCCGCTTGACCGAGTGCTAATCGCG240    GTCATAGTCTCAGCTCTGGCACTCCCCGCAGGAGAGTGCCAACACAGCGACGGGCAGGTC300    CCGGCACCCGCGACGACGGATCGACCTGGTCGCCACACTCAGATCAGTTAACCCCGTGAT360    CTCCGAAGGGGGAGGTCGGATCGTGACGACCGCCAGCTCCAAGGTTGCCATCAAGCCGCT420    CGAGGACCGCATCGTGGTCCAGCCGCTCGACGCCGAGCAGACCACGGCTTCGGGCCTGGT480    CATCCCGGACACCGCGAAGGAGAAGCCCCAGGAGGGCGTCGTCCTCGCGGTCGGCCCGGG540    CCGCTTCGAGAACGGCGAGCGCCTGCCGCTCGACGTCAAGACCGGCGACGTCGTGCTGTA600    CAGCAAGTACGGCGGCACCGAGGTCAAGTACAACGGCGAGGAGTACCTCGTCCTCTCGGC660    CCGCGACGTTCTCGCCATCATCGAGAAGTAGCAGGCCGGAGCGGTCCGGGCGCGAGCCCG720    GACGGCAGACTCCACCTTTTTCCTGAAGCGCGCCCCTGGCCCCCGCGAGTGTTTGCCGGG780    TGGCGAGGGGCGCGTTTCATTTCGAGAGCGCGGCGGCAGGCCGCTCCGAGAGGATTCGAA840    
AAGCTCCCATGGCGAAGATTCTGAAGTTCGACGAGGACGCCCGTCGCGCCCTTGAGCGCG900    GCGTGAACCAGCTGGCCGACACCGTCAAGGTGACCATCGGCCCCAAGGGCCGCAACGTCG960    TCATCGACAAGAAGTTCGGCGCCCCGACCATCACCAACGACGGCGTCACCATCGCCCGTG1020    AGGTCGAGTGCGACGACCCGTACGAGAACCTCGGCGCCCAGCTCGTCAAGGAGGTGGCGA1080    CCAAGACCAACGACATCGCGGGTGACGGCACCACCACCGCGACCGTGCTGGCCCAGGCGC1140    TGGTCCGCGAGGGCCTGCGCAACGTCGCCGCCGGCGCCTCCCCGGCCGCCCTGAAGAAGG1200    GCATCGACGCCGCCGTCGCCGCCGTCTCCGCCGAGCTGCTCGACACCGCGCGCCCGATCG1260    ACGACAAGTCCGACATCGCCGCCGTCGCCGCGCTCTCCGCGCAGGACAAGCAGGTCGGCG1320    AGCTCATCGCCGAGGCGATGGACAAGGTCGGCAAGGACGGTGTCATCAACGTCGAGGAGT1380    CCAACACCTTCGGTGTCGACCTGGACTTCACCGAGGGCATGGCCTTCGACAAGGGCTACC1440    TGTCCCCGTACATGGTGACCGACCAGGAGCGTATGGAGGCCGTCCTCGACGACCCGTACA1500    TCCTGATCCACCAGGGCAAGATCGGTTCGATCCAGGACCTGCTGCCGCTGCTGGAGAAGG1560    TCATCCAGGCGGGTGGCTCCAAGCCGCTGCTGATCATCGCCGAGGACGTCGAGGGCGAGG1620    CCCTGTCGACCCTGGTGGTCAACAAGATCCGCGGCACGTTCAACGCCGTCGCCGTCAAGG1680    CGCCCGGCTTCGGTGACCGCCGCAAGGCGATGCTCGGCGACATGGCCACCCTCACCGGTG1740    CCACCGTCATCGCCGAGGAGGTCGGCCTCAAGCTCGACCAGGCCGGTCTGGACGTGCTGG1800    GCACCGCCCGCCGCGTCACCGTCACCAAGGACGACACGACCATCGTGGACCTGGAGAAGG1860    ACGCCGAGGACGTCCAGGGCCGCGTCGCCCAGATCAAGGCCGAGATCGAGTCGACCGACT1920    CGGACTGGGACCGCGAGAAGCTCCAGGAGCGCCTCGCCAAGCTGGCCGGCGGCGTCTGCG1980    TGATCCGCGTCGGCGCGGCCACCGAGGTCGAGCTGAAGGAGCGCAAGCACCGTCTGGAGG2040    ACGCCATCTCCGCGACCCGCGCCGCGGTCGAGGAGGGCATCGTCTCCGGTGGTGGCTCCG2100    CGCTGGTCCACGCCGTCAAGGTCCTGGACGACAACCTCGGCCGCACCGGCGACGAGGCCA2160    CCGGTGT2167    (2) INFORMATION FOR SEQ ID NO:10:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 1620 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: double    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: CDS    (B) LOCATION: 1..1620    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:    ATGGCGAAGATTCTGAAGTTCGACGAGGACGCCCGTCGCGCCCTTGAG48    MetAlaLysIleLeuLysPheAspGluAspAlaArgArgAlaLeuGlu    151015    CGCGGCGTGAACCAGCTGGCCGACACCGTCAAGGTGACCATCGGCCCC96  
  ArgGlyValAsnGlnLeuAlaAspThrValLysValThrIleGlyPro    202530    AAGGGCCGCAACGTCGTCATCGACAAGAAGTTCGGCGCCCCGACCATC144    LysGlyArgAsnValValIleAspLysLysPheGlyAlaProThrIle    354045    ACCAACGACGGCGTCACCATCGCCCGTGAGGTCGAGTGCGACGACCCG192    ThrAsnAspGlyValThrIleAlaArgGluValGluCysAspAspPro    505560    TACGAGAACCTCGGCGCCCAGCTCGTCAAGGAGGTGGCGACCAAGACC240    TyrGluAsnLeuGlyAlaGlnLeuValLysGluValAlaThrLysThr    65707580    AACGACATCGCGGGTGACGGCACCACCACCGCGACCGTGCTGGCCCAG288    AsnAspIleAlaGlyAspGlyThrThrThrAlaThrValLeuAlaGln    859095    GCGCTGGTCCGCGAGGGCCTGCGCAACGTCGCCGCCGGCGCCTCCCCG336    AlaLeuValArgGluGlyLeuArgAsnValAlaAlaGlyAlaSerPro    100105110    GCCGCCCTGAAGAAGGGCATCGACGCCGCCGTCGCCGCCGTCTCCGCC384    AlaAlaLeuLysLysGlyIleAspAlaAlaValAlaAlaValSerAla    115120125    GAGCTGCTCGACACCGCGCGCCCGATCGACGACAAGTCCGACATCGCC432    GluLeuLeuAspThrAlaArgProIleAspAspLysSerAspIleAla    130135140    GCCGTCGCCGCGCTCTCCGCGCAGGACAAGCAGGTCGGCGAGCTCATC480    AlaValAlaAlaLeuSerAlaGlnAspLysGlnValGlyGluLeuIle    145150155160    GCCGAGGCGATGGACAAGGTCGGCAAGGACGGTGTCATCACCGTCGAG528    AlaGluAlaMetAspLysValGlyLysAspGlyValIleThrValGlu    165170175    GAGTCCAACACCTTCGGTGTCGACCTGGACTTCACCGAGGGCATGGCC576    GluSerAsnThrPheGlyValAspLeuAspPheThrGluGlyMetAla    180185190    TTCGACAAGGGCTACCTGTCCCCGTACATGGTGACCGACCAGGAGCGT624    PheAspLysGlyTyrLeuSerProTyrMetValThrAspGlnGluArg    195200205    ATGGAGGCCGTCCTCGACGACCCGTACATCCTGATCCACCAGGGCAAG672    MetGluAlaValLeuAspAspProTyrIleLeuIleHisGlnGlyLys    210215220    ATCGGTTCGATCCAGGACCTGCTGCCGCTGCTGGAGAAGGTCATCCAG720    IleGlySerIleGlnAspLeuLeuProLeuLeuGluLysValIleGln    225230235240    GCGGGTGGCTCCAAGCCGCTGCTGATCATCGCCGAGGACGTCGAGGGC768    AlaGlyGlySerLysProLeuLeuIleIleAlaGluAspValGluGly    245250255    GAGGCCCTGTCGACCCTGGTGGTCAACAAGATCCGCGGCACGTTCAAC816    GluAlaLeuSerThrLeuValValAsnLysIleArgGlyThrPheAsn    260265270    GCCGTCGCCGTCAAGGCGCCCGGCTTCGGTGACCGCCGCAAGGCGATG864    AlaValAlaValLysAlaProGlyPheGlyAspArgArgLysAlaMet    275280285    
CTCGGCGACATGGCCACCCTCACCGGTGCCACCGTCATCGCCGAGGAG912    LeuGlyAspMetAlaThrLeuThrGlyAlaThrValIleAlaGluGlu    290295300    GTCGGCCTCAAGCTCGACCAGGCCGGTCTGGACGTGCTGGGCACCGCC960    ValGlyLeuLysLeuAspGlnAlaGlyLeuAspValLeuGlyThrAla    305310315320    CGCCGCGTCACCGTCACCAAGGACGACACGACCATCGTGGACGGCGGC1008    ArgArgValThrValThrLysAspAspThrThrIleValAspGlyGly    325330335    GGCAACGCCGAGGACGTCCAGGGCCGCGTCGCCCAGATCAAGGCCGAG1056    GlyAsnAlaGluAspValGlnGlyArgValAlaGlnIleLysAlaGlu    340345350    ATCGAGTCGACCGACTCGGACTGGGACCGCGAGAAGCTCCAGGAGCGC1104    IleGluSerThrAspSerAspTrpAspArgGluLysLeuGlnGluArg    355360365    CTCGCCAAGCTGGCCGGCGGCGTCTGCGTGATCCGCGTCGGCGCGGCC1152    LeuAlaLysLeuAlaGlyGlyValCysValIleArgValGlyAlaAla    370375380    ACCGAGGTCGAGCTGAAGGAGCGCAAGCACCGTCTGGAGGACGCCATC1200    ThrGluValGluLeuLysGluArgLysHisArgLeuGluAspAlaIle    385390395400    TCCGCGACCCGCGCCGCGGTCGAGGAGGGCATCGTCTCCGGTGGTGGC1248    SerAlaThrArgAlaAlaValGluGluGlyIleValSerGlyGlyGly    405410415    TCCGCGCTGGTCCACGCCGTCAAGGTCCTGGACGACAACCTCGGCCGC1296    SerAlaLeuValHisAlaValLysValLeuAspAspAsnLeuGlyArg    420425430    ACCGGCGACGAGGCCACCGGTGTCGCGGTCGTCCGCCGCGCCGCCGTC1344    ThrGlyAspGluAlaThrGlyValAlaValValArgArgAlaAlaVal    435440445    GAGCCGCTGCGCTGGATCGCCGAGAACGCCGGCCTCGAGGGCTACGTC1392    GluProLeuArgTrpIleAlaGluAsnAlaGlyLeuGluGlyTyrVal    450455460    ATCACCACCAAGGTGGCGGAGCTCGACAAGGGCCAGGGCTTCAACGCG1440    IleThrThrLysValAlaGluLeuAspLysGlyGlnGlyPheAsnAla    465470475480    GCCACCGGCGAGTACGGCGACCTGGTCAAGGCCGGCGTCATCGACCCG1488    AlaThrGlyGluTyrGlyAspLeuValLysAlaGlyValIleAspPro    485490495    GTCAAGGTCACCGCGTCCGCCCTGGAGAACGCGGCCTCCATCGCCTCC1536    ValLysValThrAlaSerAlaLeuGluAsnAlaAlaSerIleAlaSer    500505510    CTGCTCCTGACGACCGAGACCCTGGTCGTCGAGAAGCCGGCCGAGGAG1584    LeuLeuLeuThrThrGluThrLeuValValGluLysProAlaGluGlu    515520525    GAGCCCGAGGCCGGTCACGGTCACGGGCACAGCCAC1620    GluProGluAlaGlyHisGlyHisGlyHisSerHis    530535540    (2) INFORMATION FOR SEQ ID NO:11:    (i) SEQUENCE CHARACTERISTICS:    (A) 
LENGTH: 2668 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:    CCGGCCGGGCTGAGGTTGGCTGGCTGGCCGGGTTCGGCCGGTGGGTCGAGGTGGCCTGGC60    CGGGCTCGCCAGGGTGAGTTGGCCGAGCCGAGGCGGCCCCGGGGCTCCCCGGGCCGAGTT120    GGCGCGGCCAGGCCAGGGCTCAGCAGGGTGGGGGAGTGGGGCAGGCGGCCCGGTAGGGGA180    GTGCGGGAGGGCAGCGCGCGCCGCGCGCATTGGCACTCCGCTTGACCGAGTGCTAATCGC240    GGTCATAGTCTCAGCTCTGGCACTCCCCGCAGGAGACTGCCAACACAGCGACGGGCAGGT300    CCGGCACCCGCGACGACGGATCGACCTGGTCGCCACACTCAGATCAGTTAACCCCGTGAT360    CTCCGAAGGGGGAGGTCGGATCGTGACGACCGCCAGCTCCAAGGTTGCCATCAAGCCGCT420    CGAGGACCGCATCGTGGTCCAGCCGCTCGACGCCGAGCAGACCACGGCTTCGGGCCTGGT480    CATCCCGGACACCGCGAAGGAGAAGCCCCAGGAGGGCGTCGTCCTCGCGGTCGGCCCGGG540    CCGCTTCGAGAACGGCGAGCGCCTGCCGCTCGACGTCAAGACCGGCGACGTCGTGCTGTA600    CAGCAAGTACGGCGGCACCGAGGTCAAGTACAACGGCGAGGAGTACCTCGTCCTCTCGGC660    CCGCGACGTTCTCGCCATCATCGAGAAGTAGCAGGCCGGAGCGGTCCGGGCGCGAGCCCG720    GACGGCAGACTCCACCTTTTTCCTGAAGCGCGCCCCTGGCCCCCGCGAGTGTTTGCCGGG780    TGGCGAGGGGCGCGTTTCATTTCGAGAGCGCGGCGGCAGGCCGCTCCGAGAGGATTCGAA840    AAGCTCCCATGGCGAAGATTCTGAAGTTCGACGAGGACGCCCGTCGCGCCCTTGAGCGCG900    GCGTGAACCAGCTGGCCGACACCGTCAAGGTGACCATCGGCCCCAAGGGCCGCAACGTCG960    TCATCGACAAGAAGTTCGGCGCCCCGACCATCACCAACGACGGCGTCACCATCGCCCGTG1020    AGGTCGAGTGCGACGACCCGTACGAGAACCTCGGCGCCCAGCTCGTCAAGGAGGTGGCGA1080    CCAAGACCAACGACATCGCGGGTGACGGCACCACCACCGCGACCGTGCTGGCCCAGGCGC1140    TGGTCCGCGAGGGCCTGCGCAACGTCGCCGCCGGCGCCTCCCCGGCCGCCCTGAAGAAGG1200    GCATCGACGCCGCCGTCGCCGCCGTCTCCGCCGAGCTGCTCGACACCGCGCGCCCGATCG1260    ACGACAAGTCCGACATCGCCGCCGTCGCCGCGCTCTCCGCGCAGGACAAGCAGGTCGGCG1320    AGCTCATCGCCGAGGCGATGGACAAGGTCGGCAAGGACGGTGTCATCAACGTCGAGGAGT1380    CCAACACCTTCGGTGTCGACCTGGACTTCACCGAGGGCATGGCCTTCGACAAGGGCTACC1440    TGTCCCCGTACATGGTGACCGACCAGGAGCGTATGGAGGCCGTCCTCGACGACCCGTACA1500    TCCTGATCCACCAGGGCAAGATCGGTTCGATCCAGGACCTGCTGCCGCTGCTGGAGAAGG1560    
TCATCCAGGCGGGTGGCTCCAAGCCGCTGCTGATCATCGCCGAGGACGTCGAGGGCGAGG1620    CCCTGTCGACCCTGGTGGTCAACAAGATCCGCGGCACGTTCAACGCCGTCGCCGTCAAGG1680    CGCCCGGCTTCGGTGACCGCCGCAAGGCGATGCTCGGCGACATGGCCACCCTCACCGGTG1740    CCACCGTCATCGCCGAGGAGGTCGGCCTCAAGCTCGACCAGGCCGGTCTGGACGTGCTGG1800    GCACCGCCCGCCGCGTCACCGTCACCAAGGACGACACGACCATCGTGGACCTGGAGAAGG1860    ACGCCGAGGACGTCCAGGGCCGCGTCGCCCAGATCAAGGCCGAGATCGAGTCGACCGACT1920    CGGACTGGGACCGCGAGAAGCTCCAGGAGCGCCTCGCCAAGCTGGCCGGCGGCGTCTGCG1980    TGATCCGCGTCGGCGCGGCCACCGAGGTCGAGCTGAAGGAGCGCAAGCACCGTCTGGAGG2040    ACGCCATCTCCGCGACCCGCGCCGCGGTCGAGGAGGGCATCGTCTCCGGTGGTGGCTCCG2100    CGCTGGTCCACGCCGTCAAGGTCCTGGACGACAACCTCGGCCGCACCGGCGACGAGGCCA2160    CCGGTGTCGCGGTCGTCCGCCGCGCCGCCGTCGAGCCGCTGCGCTGGATCGCCGAGAACG2220    CCGGCCTCGAGGGCTACGTCATCACCACCAAGGTGGCGGAGCTCGACAAGGGCCAGGGCT2280    TCAACGCGGCCACCGGCGAGTACGGCGACCTGGTCAAGGCCGGCGTCATCGACCCGGTCA2340    AGGTCACCCGCTCCGCCCTGGAGAACGCGGCCTCCATCGCCTCCCTGCTCCTGACGACCG2400    AGACCCTGGTCGTCGAGAAGCCGGCCGAGGAGGAGCCCGAGGCCGGTCACGGTCACGGGC2460    ACAGCCACTGAGGCTGACCCCTTCCGCAGCCGAGGCCCGGCTCCCCGTCGCGGGGAGCCG2520    GGCCTCCGGCGTGTCCGGGACCCCCCGGGACGCGCGACGCCTACCGCGGCCCGTACTTGC2580    GGCCGGTACGCGAGGTCATCCCGGTCAGCAGGGCCCGCGGGGTCAGCTTCACCAGGCCCA2640    TCAGCGCCTTGTACCGAGGGTCCGGGAT2668    (2) INFORMATION FOR SEQ ID NO:12:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 10 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: peptide    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:    AspAspProTyrGluAsnLeuGlyAlaGln    1510    (2) INFORMATION FOR SEQ ID NO:13:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 29 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: misc.sub.-- feature    (B) LOCATION: 9    (D) OTHER INFORMATION: /note= "Nucleotide 9 wherein S is C    or G."    
(ix) FEATURE:    (A) NAME/KEY: misc_feature    (B) LOCATION: 21    (D) OTHER INFORMATION: /note= "Nucleotide 21 wherein S is C or G."    (ix) FEATURE:    (A) NAME/KEY: misc_feature    (B) LOCATION: 27    (D) OTHER INFORMATION: /note= "Nucleotide 27 wherein Y is C or T."    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:    GACGACCCSTACGAGAACCTSGGCGCYCA29    (2) INFORMATION FOR SEQ ID NO:14:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 30 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: misc_feature    (B) LOCATION: 9    (D) OTHER INFORMATION: /note= "Nucleotide 9 wherein S is C or G."    (ix) FEATURE:    (A) NAME/KEY: misc_feature    (B) LOCATION: 21    (D) OTHER INFORMATION: /note= "Nucleotide 21 wherein S is C or G."    (ix) FEATURE:    (A) NAME/KEY: misc_feature    (B) LOCATION: 27    (D) OTHER INFORMATION: /note= "Nucleotide 27 wherein R is G or A."    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:    CTGCTGGGSATGCTCTTGGASCCGCGRGTC30    (2) INFORMATION FOR SEQ ID NO:15:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 8 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: peptide    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:    GlyHisGlyHisGlyHisSerHis    15    __________________________________________________________________________
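
As an illustrative cross-check that is not part of the original listing, the degenerate oligonucleotide of SEQ ID NO:13 can be expanded using the ambiguity codes given in its feature notes (S is C or G, Y is C or T) and translated in frame. The following is a minimal Python sketch. The function names `expand` and `translate` and the reduced codon table are assumptions introduced here, as is the choice to read the oligonucleotide from its first base.

```python
from itertools import product

# Sequences copied from the listing above.
SEQ_ID_13 = "GACGACCCSTACGAGAACCTSGGCGCYCA"  # degenerate oligonucleotide
SEQ_ID_12 = "DDPYENLGAQ"  # AspAspProTyrGluAsnLeuGlyAlaGln in one-letter code

# Ambiguity codes used in SEQ ID NO:13, as stated in its feature notes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "Y": "CT"}

# Reduced codon table: only the codons that can occur in this oligonucleotide.
CODONS = {
    "GAC": "D", "CCC": "P", "CCG": "P", "TAC": "Y", "GAG": "E",
    "AAC": "N", "CTC": "L", "CTG": "L", "GGC": "G", "GCC": "A", "GCT": "A",
}


def expand(degenerate):
    """Return every concrete sequence represented by a degenerate one."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate))]


def translate(dna):
    """Translate complete codons from the first base; skip a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


variants = expand(SEQ_ID_13)
peptides = {translate(v) for v in variants}
print(len(variants), peptides)
```

Under these assumptions, all eight variants translate to the first nine residues of SEQ ID NO:12. The two remaining bases, CA, are consistent with the first two positions of a Gln codon (CAA or CAG).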

We claim:
 1. A recombinant nucleotide sequence comprising: a regulatory sequence for initiation of transcription, this regulatory sequence including a promoter contained in the SmaI-SmaI fragment of pPM1005 or in the BglII-SstI fragment of pPM997, operably linked to; a sequence coding for a heterologous polypeptide different from that naturally associated with said promoter, wherein said coding sequence is positioned downstream from said regulatory sequence for initiation of transcription at a site which, under suitable conditions, allows the polypeptide to be expressed under the control of said promoter.
 2. The recombinant nucleotide sequence according to claim 1, wherein the promoter comprises a GCACTC 9N GAGTGC (SEQ ID NO: 1) sequence.
 3. An expression vector including a recombinant nucleotide sequence according to claim 1 which expresses the polypeptide.
 4. An expression vector according to claim 3 which is a plasmid.
 5. A heat shock polypeptide comprising: either (i) one of the amino acid sequences (SEQ ID NOs: 8 and 10) shown below; or (ii) a part of this sequence (i), said part comprising the NH₂ terminus and said polypeptide having a molecular weight of about 18 kDa and an isoelectric point of about 9: SEQ ID NO: 8: ##STR6## and SEQ ID NO: 10: ##STR7##
 6. The polypeptide according to claim 5, comprising an amino acid sequence which extends maximally from amino acid number 1 to amino acid number 170.
 7. A nucleic acid sequence coding for a heat shock protein of claim 5 or 6.
 8. The recombinant nucleotide sequence according to claim 1, wherein said regulatory sequence is present in Streptomyces.
 9. A cell transformed by the expression vector according to claims 3 or 4, which recognizes said promoter and expresses said polypeptide.
 10. A process for the production of a polypeptide comprising the steps of: transforming a cell with the expression vector according to claims 3 or 4 under conditions which allow expression of said polypeptide, said cell being capable of recognizing said promoter; and recovering the polypeptide expressed.
 11. The process according to claim 10, wherein said cell is of genus Mycobacterium.
 12. An isolated nucleotide sequence selected from the following sequences: P1 corresponding to one of the sequences (SEQ ID NOs: 2-3): ##STR8## P2 corresponding to one of the sequences (SEQ ID NOs: 4-5): ##STR9##
 13. An 18 kD heat shock protein isolated from Streptomyces albus and having an isoelectric point higher than 9.
 14. A cell of the genus Streptomyces transformed by the expression vector according to claims 3 or 4, which recognizes said promoter and expresses said polypeptide.
 15. A process for the production of a polypeptide comprising the steps of: transforming a cell of the genus Streptomyces by means of an expression vector according to claims 3 or 4 under conditions which allow expression of said polypeptide, said cell being capable of recognizing said promoter; and recovering the polypeptide expressed.
 16. The process according to claim 15, wherein the conditions allowing the expression of said polypeptide include application of a heat shock.
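
The motif recited in claim 2, GCACTC 9N GAGTGC (SEQ ID NO: 1), can be located in a nucleotide sequence with a simple pattern search. The following is a minimal Python sketch, assuming that "9N" means exactly nine bases, each being any of A, C, G or T. The function name `find_motif` is hypothetical, and the two test sequences are SEQ ID NO:2 and SEQ ID NO:4 as printed in the sequence listing.

```python
import re

# GCACTC, nine arbitrary bases, then GAGTGC (SEQ ID NO:1).
MOTIF = re.compile(r"GCACTC[ACGT]{9}GAGTGC")

# Copied from the sequence listing.
SEQ_ID_2 = "CATTGGCACTCCGCTTGACCGAGTGCTAATCGCGGTCATAGTCTCAGCTCTG"
SEQ_ID_4 = "GGAGGCCCCTAGCGCCTGCACTCTCCTACCCCGAGTGCTATTATTGGCGTTA"


def find_motif(dna):
    """Return (1-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(dna.upper())]


for name, seq in (("SEQ ID NO:2", SEQ_ID_2), ("SEQ ID NO:4", SEQ_ID_4)):
    print(name, find_motif(seq))
```

With this pattern, each of the two sequences contains a single occurrence of the motif, beginning at nucleotide 6 of SEQ ID NO:2 and nucleotide 18 of SEQ ID NO:4. Because GCACTC and GAGTGC are reverse complements of each other, the same pattern also matches the complementary strand at the same location.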